Using Protégé-2000 to Edit RDF
30 January 2001
Abstract:
For the past 15 years, the Knowledge Modeling Group (KMG) at Stanford University has developed a variety of knowledge-modeling tools as part of the Protégé project. The current knowledge-base editing tool, Protégé-2000, is an extensible, open-source application. An explicit goal of the KMG is to provide a knowledge-base editing platform that can easily be adapted to any well-defined frame-based modeling language while simultaneously enabling maximal code reuse (e.g., tools developed for a particular modeling language and application ought to be reusable with different modeling languages and applications). Since RDF Schema is a frame-based language, the core Protégé-2000 tools can be easily extended to acquire, edit, and maintain RDF knowledge bases. This document discusses the current support for RDF editing in Protégé-2000 and the tradeoffs and possibilities of implementing a complete set of RDF and RDF Schema editing features in Protégé.
Copyright © 2000, 2001 Stanford University
Protégé-2000 is now available as free software under the open-source Mozilla Public License and is compatible with a wide range of knowledge-representation languages. It provides an integrated knowledge-base editing environment and an extensible architecture for the creation of customized knowledge-based tools. In the remainder of this overview, we briefly discuss the key architectural ideas that underlie Protégé-2000, in order to motivate and simplify the rest of the document. This diagram may be helpful (alternatively, you can view these classes and instances in the Protégé default user interface).
MiniVan is a class
example_INSTANCE_0009 is an instance of MiniVan
registeredTo and rearSeatLegRoom are slots which have been attached to the class MiniVan
The value of the rearSeatLegRoom slot at example_INSTANCE_0009 is "40.2"
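The four statements above can be mirrored in a small data-structure sketch. This is an illustration only, written in Python; the type names Slot, FrameClass, and Instance and their methods are hypothetical and are not the Protégé-2000 API.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    """A named property that can be attached to classes."""
    name: str


@dataclass
class FrameClass:
    """A class together with the slots attached to it."""
    name: str
    slots: dict = field(default_factory=dict)

    def attach(self, slot):
        self.slots[slot.name] = slot


@dataclass
class Instance:
    """An instance with one direct type and a value per slot."""
    name: str
    direct_type: FrameClass
    values: dict = field(default_factory=dict)

    def set_value(self, slot_name, value):
        # Only slots attached to the direct type may carry values.
        if slot_name not in self.direct_type.slots:
            raise KeyError(f"{slot_name} is not attached to {self.direct_type.name}")
        self.values[slot_name] = value


# MiniVan is a class with two attached slots.
mini_van = FrameClass("MiniVan")
mini_van.attach(Slot("registeredTo"))
mini_van.attach(Slot("rearSeatLegRoom"))

# example_INSTANCE_0009 is an instance of MiniVan with one slot value.
van = Instance("example_INSTANCE_0009", mini_van)
van.set_value("rearSeatLegRoom", "40.2")
```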
Protégé-2000 significantly simplifies the often-complicated task of developing an appropriate class hierarchy for a given application (or set of applications). The user can browse the class hierarchy, visualize class definitions, create new classes and slots, and bind slots to classes. All of this is done in the Protégé-2000 ontology editor.
Structured data entry allows users to enter an instance quickly and easily (and to verify that the information that they have entered is correct). Protégé-2000 enables this capability through the use of forms: every class in Protégé-2000 is associated with a user-interface form, which can be customized in a number of ways (for example, by placing important information at the "top" of the form). The use of forms transforms the knowledge-base operation "acquiring an instance" into the user-interface operation "filling out the blanks on the form."
The complete editing cycle is therefore the following: define a concept, lay out the associated form, and use the form to acquire instances.
Protégé-2000 defines a set of custom user-interface components (widgets) that know how to acquire and display the value of a slot on a particular class or instance. In this example, FloatFieldWidget is being used to acquire the rearSeatLegRoom of an instance of MiniVan. FloatFieldWidget not only displays the value of the slot but also allows the user to edit the value, and it can perform some simple validation checks on what the user has typed.
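As a rough illustration of the kind of check a float-valued widget might run on typed input, consider the following Python sketch. It does not use the Protégé widget API; the function name parse_float_field and the optional minimum and maximum bounds are assumptions made for the example.

```python
def parse_float_field(text, minimum=None, maximum=None):
    """Return the float typed by the user, or raise ValueError with a message.

    Hypothetical helper: the bounds stand in for whatever constraints
    a real widget would read from the slot definition.
    """
    try:
        value = float(text.strip())
    except ValueError:
        raise ValueError(f"not a number: {text!r}") from None
    if minimum is not None and value < minimum:
        raise ValueError(f"{value} is below the minimum {minimum}")
    if maximum is not None and value > maximum:
        raise ValueError(f"{value} is above the maximum {maximum}")
    return value


# Surrounding whitespace is tolerated; the text is converted to a float.
leg_room = parse_float_field(" 40.2 ", minimum=0.0)
```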
System developers use a well-defined API to implement widgets as separate components that interact with the core Protégé framework. This component architecture enables the creation of collections of user-interface devices that are specialized for acquiring certain types of knowledge (e.g., widgets that are appropriate for slots with certain value types). In the following example, a user decides to use a SliderWidget instead of a FloatFieldWidget to acquire (and view) a floating-point value. The user views the automatically generated form, chooses to use a SliderWidget instead of a FloatFieldWidget, and uses it to browse the knowledge base.
An additional benefit of this layer of indirection is that it provides a convenient location for translation code. For example, we implemented the RDF storage layer to import RDF files to Protégé and store Protégé knowledge bases in RDF. This layer, in addition to saving the knowledge base as an RDF document or creating a knowledge base from an RDF document, also performs the necessary interpretation and translation.
1.4 Conclusions
While there are some minor distinctions between the Protégé knowledge model
and the knowledge model used in RDF, the differences are easy to identify and to
overcome, using a few simple mapping conventions that we adopted in the
persistence layer.
Feature | RDF and RDF Schema | Protégé-2000 |
Multi-class membership | A resource can be an instance of one or more classes | An instance can have only one direct type |
Range constraints | The value of the range property is a single class, which constrains the value of the corresponding property to instances of that class | A value of a slot can be a value of a primitive type or an instance of a class. There can be one or more classes that constrain the value |
Containers | There are three types of container objects: bag, sequence, and alternative | Collections have to be encoded, e.g. by ordered lists |
Namespaces | Frame names are unique within one schema; for multiple schemas, the XML namespace facility is used to associate each property with the schema | Frame names are unique within one project. Name conflicts are not resolved during project inclusion. |
Literal markup | A literal may have content that is XML markup but is not further evaluated by the RDF processor or it can be a primitive datatype defined by XML | Literals can be either plain strings, numbers, symbols, or boolean values |
RDF permits "multi-class membership" or "complex-entity types": resources may be instances of several classes C1, ..., Cn. In Protégé-2000, each instance has only one direct type because of user-interface considerations. (If there is no single class that is a parent of that instance, then there is no place to edit the complete form for an instance.) When a user needs to create a resource that is an instance of several classes C1, ..., Cn, the RDF-editor layer of Protégé-2000 could simulate the multi-class membership by automatically creating a new class C as a subclass of C1, ..., Cn and then by creating an instance of class C. This solution was used for creating an interface between Protégé and Loom. For the RDF support in Protégé-2000 version 1.5, only one type is picked, and an error message is generated.
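The simulation described above can be sketched as follows. This is a minimal Python illustration, assuming the class hierarchy is held as a mapping from each class name to the list of its direct superclasses; the function name ensure_single_type and the "_AND_" naming scheme for generated classes are hypothetical.

```python
def ensure_single_type(superclasses, types):
    """Return one class name usable as the direct type for all of `types`.

    `superclasses` maps a class name to the list of its direct superclasses
    and is extended in place when a new intermediate class is needed.
    """
    wanted = sorted(set(types))
    if not wanted:
        raise ValueError("at least one type is required")
    if len(wanted) == 1:
        return wanted[0]
    # Reuse a class whose direct superclasses are exactly the requested types.
    for name, parents in superclasses.items():
        if sorted(parents) == wanted:
            return name
    # Otherwise create a new class C as a subclass of C1, ..., Cn.
    synthetic = "_AND_".join(wanted)
    superclasses[synthetic] = list(wanted)
    return synthetic


hierarchy = {"Vehicle": [], "Asset": [], "Car": ["Vehicle"]}
direct_type = ensure_single_type(hierarchy, ["Vehicle", "Asset"])
```

Reusing an existing intermediate class keeps repeated imports from generating duplicates; whether that is desirable in a real knowledge base is a design choice this sketch does not settle.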
4.2 Core constraints: rdfs:domain and rdfs:range
In Protégé-2000, slots (properties) are linked to classes through slot attachment (see Protégé-2000 knowledge model summary). For each slot S, the set of classes to which S is attached can be viewed as the domain of slot S. This notion of slot domain is the same as rdfs:domain for properties in RDF.
Protégé-2000 and RDF Schema handle ranges quite differently. RDF Schema, via the rdfs:range property, defines the range of a property to be instances of a single class. In Protégé-2000, on the other hand, slot ranges are defined using multiple properties. Each slot has both an associated primitive type (one of: integer, float, string, symbol, boolean, class, or instance) and additional constraints (depending on the primitive type) that allow the range to be more precisely specified. For example, the semantics of RDF Schema's rdfs:range property are exactly modeled by using the primitive type Instance and listing exactly one class in the "Allowed Classes" facet. Protégé-2000 allows the user to have more than one class in the "Allowed Classes" list.
In order to comply with the one-range restriction in RDF, the RDF-editing layer picks the smallest common superclass of the classes in the allowed-classes list (alternatively, it could create a new class that is a superclass of all of the intended allowed classes and use it as the range). This mapping can be done in the RDF storage model without the user ever being aware of it. Alternatively, users of Protégé-2000 can exercise self-discipline and create the common superclass themselves.
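One way to compute a smallest common superclass is sketched below in Python. The hierarchy representation (class name mapped to its direct superclasses), the function names, and the sample classes other than MiniVan are assumptions for illustration, not the algorithm used by the RDF-editing layer. With multiple inheritance there may be more than one most-specific candidate, so the sketch returns a list.

```python
def ancestors(superclasses, cls):
    """Return cls together with all of its direct and indirect superclasses."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(superclasses.get(current, ()))
    return seen


def smallest_common_superclasses(superclasses, allowed):
    """Return the most specific classes that are superclasses of every allowed class."""
    if not allowed:
        raise ValueError("the allowed-classes list is empty")
    common = set.intersection(*(ancestors(superclasses, c) for c in allowed))
    # Drop every candidate that is a proper ancestor of another candidate.
    minimal = [
        c for c in common
        if not any(c != other and c in ancestors(superclasses, other) for other in common)
    ]
    return sorted(minimal)


hierarchy = {
    "Thing": [],
    "Vehicle": ["Thing"],
    "Car": ["Vehicle"],
    "MiniVan": ["Car"],
    "Truck": ["Vehicle"],
    "Person": ["Thing"],
}
range_candidates = smallest_common_superclasses(hierarchy, ["MiniVan", "Truck"])
```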
RDF Schema defines three different types of containers for properties that have multiple values: Sequence, Alternative, and Bag (each of these container types has a different semantics). The Protégé knowledge model, on the other hand, does not have an explicit container type.
XML namespaces, which are basic to RDF and RDF Schema, are not currently supported in Protégé-2000 directly. However, the RDF persistence layer introduces the concept of namespace abbreviations: each frame that is not in the default namespace (which has to be specified when importing or saving a knowledge base) is prefixed with a namespace abbreviation. When saving a knowledge base, the abbreviations are expanded into the full namespace URIs. This concept also allows included projects to use namespaces different from the namespace of the main project, thus avoiding name clashes.
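The expansion step performed on save might look like the Python sketch below. The colon separator between abbreviation and local name, the function name expand_frame_name, and the example.org default namespace are assumptions; the text above does not specify how the prefix is delimited.

```python
def expand_frame_name(name, abbreviations, default_namespace):
    """Expand an abbreviated frame name into a full URI.

    Names without a known abbreviation are placed in the default namespace.
    """
    prefix, separator, local_name = name.partition(":")
    if separator and prefix in abbreviations:
        return abbreviations[prefix] + local_name
    return default_namespace + name


abbreviations = {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"}
default_namespace = "http://example.org/cars#"  # hypothetical default namespace

class_uri = expand_frame_name("rdfs:Class", abbreviations, default_namespace)
van_uri = expand_frame_name("MiniVan", abbreviations, default_namespace)
```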
RDF literals are "the most primitive value type represented in
RDF, typically a string of characters. The content of a literal is not
interpreted by RDF itself and may contain additional XML markup." RDF does not
yet define any other concrete data types like integer, float, date, ..., but
often the corresponding XML Schema
types are used (unofficially).
Therefore, there are two
issues in supporting the RDF notion of literals in Protégé: allowing XML markup
and supporting primitive data types. In the current implementation, XML Schema
datatypes are not automatically recognized on import. RDF Literal (and any
subclasses) are mapped to Protégé's String type when used as a
range.
XML markup for RDF literals is not currently
supported in Protégé-2000. The Protégé plug-in API allows implementing slot widgets
that display, acquire, and validate slot values of a specific type. Therefore
one can implement an XML-value widget that will acquire and validate the XML
input to ensure that it is well-formed and that the XML instance corresponds to
a specific DTD. These widgets can be implemented independently of the
RDF-editing layer.
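The well-formedness half of such a check can be sketched with Python's standard library, as below. The function name is hypothetical, and the sketch covers only well-formedness: validation against a DTD is not attempted here, because xml.etree.ElementTree does not validate.

```python
import xml.etree.ElementTree as ET


def is_well_formed_fragment(markup):
    """Return True if `markup` parses as well-formed XML content."""
    try:
        # Wrap the input in a dummy root element so that mixed text and
        # several sibling elements are accepted as literal content.
        ET.fromstring(f"<literal>{markup}</literal>")
    except ET.ParseError:
        return False
    return True


accepted = is_well_formed_fragment("<b>40.2</b> inches")
rejected = is_well_formed_fragment("<b>40.2")
```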
Multi-class membership | Only one class is picked (plus error message). | Automatically generate the necessary intermediate classes. |
Range constraints | Smallest superclass is used. | Automatically generate intermediate classes which encapsulate the range constraints. |
Containers | Not supported. | |
Namespaces | Automatically prepend the namespace abbreviation to the frame name. | Support namespaces throughout the Protégé-2000 framework. |
Literals | Mapped to String. | Support XML Schema types if their use becomes official in RDF. |
The left-hand pane visualizes the class hierarchy in a tree. The right-hand pane summarizes the slots that are attached to the highlighted class. Each slot has a cardinality (single or multiple), which defines the number of possible values for the slot, and a value type, which defines the types of values it can hold. Depending on the value type, additional restrictions on the values can be specified using facets. For instance, if the value of a slot is an instance of another class, the allowed-classes facet lists the classes that the instances can come from.